Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment

نویسنده

  • Thomas Schoenemann
چکیده

We derive variants of the fertility based models IBM-3 and IBM-4 that, while maintaining their zero and first order parameters, are nondeficient. Subsequently, we proceed to derive a method to compute a likely alignment and its neighbors as well as give a solution of EM training. The arising M-step energies are non-trivial and handled via projected gradient ascent. Our evaluation on gold alignments shows substantial improvements (in weighted Fmeasure) for the IBM-3. For the IBM4 there are no consistent improvements. Training the nondeficient IBM-5 in the regular way gives surprisingly good results. Using the resulting alignments for phrasebased translation systems offers no clear insights w.r.t. BLEU scores.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving IBM Word Alignment Model 1

We investigate a number of simple methods for improving the word-alignment accuracy of IBM Model 1. We demonstrate reduction in alignment error rate of approximately 30% resulting from (1) giving extra weight to the probability of alignment to the null word, (2) smoothing probability estimates for rare words, and (3) using a simple heuristic estimation method to initialize, or replace, EM train...

متن کامل

Normalizing German and English Inflectional Morphology to Improve Statistical Word Alignment

German has a richer system of inflectional morphology than English, which causes problems for current approaches to statistical word alignment. Using Giza++ as a reference implementation of the IBM Model 1, an HMMbased alignment and IBM Model 4, we measure the impact of normalizing inflectional morphology on German-English statistical word alignment. We demonstrate that normalizing inflectional...

متن کامل

A New Approach for English-Chinese Named Entity Alignment

Traditional word alignment approaches cannot come up with satisfactory results for Named Entities. In this paper, we propose a novel approach using a maximum entropy model for named entity alignment. To ease the training of the maximum entropy model, bootstrapping is used to help supervised learning. Unlike previous work reported in the literature, our work conducts bilingual Named Entity align...

متن کامل

Lexicon models for hierarchical phrase-based machine translation

In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and—given the word alignment matrix—relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a disc...

متن کامل

Reducing Parameter Space for Word Alignment

This paper presents the experimental results of our attemps to reduce the size of the parameter space in word alignment algorithm. We use IBM Model 4 as a baseline. In order to reduce the parameter space, we pre-processed the training corpus using a word lemmatizer and a bilingual term extraction algorithm. Using these additional components, we obtained an improvement in the alignment error rate.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013